TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations

Authors

  • Nestor Alvaro
  • Yusuke Miyao
  • Nigel Collier
Abstract

BACKGROUND: Work on pharmacovigilance systems using texts from PubMed and Twitter typically targets different elements and uses different annotation guidelines. As a result, there is no comparable set of documents from both Twitter and PubMed annotated in the same manner.

OBJECTIVE: This study aimed to provide a comparable corpus of texts from PubMed and Twitter that can be used to study drug reports from these two sources of information, allowing pharmacovigilance researchers who use natural language processing (NLP) to run experiments that clarify the similarities and differences between drug reports in Twitter and PubMed.

METHODS: We produced a corpus comprising 1000 tweets and 1000 PubMed sentences, selected using the same strategy and annotated at the entity level by the same experts (pharmacists) using the same set of guidelines.

RESULTS: The resulting corpus, annotated by two pharmacists, comprises semantically correct annotations for a set of drugs, diseases, and symptoms. It contains annotations for 3144 entities, 2749 relations, and 5003 attributes.

CONCLUSIONS: We present a corpus that is unique in its characteristics: it is the first corpus for pharmacovigilance curated from Twitter messages and PubMed sentences using the same data selection and annotation strategies. We believe this corpus will be of particular interest to researchers who want to compare results from pharmacovigilance systems (eg, classifiers and named entity recognition systems) when using data from Twitter and from PubMed. We hope that, given the comprehensive set of drug names and the annotated entities and relations, this corpus will become a standard resource for comparing results from different pharmacovigilance studies in the area of NLP.
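To make the annotation layers named in the abstract (entities, relations, attributes) concrete, the following is a minimal Python sketch of how such a corpus might be represented and tallied per source. It is an illustration under stated assumptions, not the authors' actual file format: the class names, the field names, the relation label "example-relation", and the attribute name "polarity" are all hypothetical. Only the entity types (drug, disease, symptom) and the two sources come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    type: str  # "Drug", "Disease", or "Symptom" (types named in the abstract)
    start: int  # character offset where the mention begins
    end: int  # character offset just past the mention
    attributes: dict = field(default_factory=dict)  # hypothetical attribute names


@dataclass
class Relation:
    type: str  # hypothetical label; the abstract does not name relation types
    source_id: str
    target_id: str


@dataclass
class Document:
    source: str  # "twitter" or "pubmed"
    text: str
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def count_annotations(documents):
    """Tally entities, relations, and attributes separately for each source."""
    totals = {}
    for doc in documents:
        counts = totals.setdefault(doc.source, Counter())
        counts["entities"] += len(doc.entities)
        counts["relations"] += len(doc.relations)
        counts["attributes"] += sum(len(e.attributes) for e in doc.entities)
    return totals


# A made-up example document, not taken from the corpus.
doc = Document(
    source="twitter",
    text="aspirin helped my headache",
    entities=[
        Entity("T1", "Drug", 0, 7),
        Entity("T2", "Symptom", 18, 26, {"polarity": "positive"}),
    ],
    relations=[Relation("example-relation", "T1", "T2")],
)

totals = count_annotations([doc])
```

Keeping the source on each document lets the same counting function be run over tweets and PubMed sentences side by side, which is the kind of comparison the corpus is meant to support.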

Similar resources

A SNPshot of PubMed to associate genetic variants with drugs, diseases, and adverse reactions

MOTIVATION Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average". Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic va...

Review of “Twitter and Jihad: The Communication Strategy of ISIS” edited by Monica Maggioni and Paolo Magri

Twitter and Jihad: The Communication Strategy of ISIS edited by Monica Maggioni & Paolo Magri. Milan, Italy: ISPI, 2015. 168pp., $10 (p/b), ISBN 978-88-98014-66-8

A Review of the Effects of Various Venotonics on Improvement of Postoperative Symptoms

There is evidence for the efficacy of several venotonics in improving postoperative symptoms, including bleeding, pain, etc. A thorough search was conducted in Google Scholar, PubMed, and Cochrane Library covering the articles published in 2000–2020. We included trials assessing the efficacy of phlebotonics in patients with chronic venous insufficiency and other venous diseases or traumas. Fin...

Annotation and Extraction of Relations from Italian Medical Records

We address the problem of extracting knowledge from large scale clinical records written in Italian by physicians. We perform recognition of relevant entities such as symptoms, diseases, treatments, measurements, drugs and so forth, and then we determine their semantic relations. We developed suitable training corpora in order to apply machine learning techniques to this task. We report on expe...

A Corpus Study for Identifying Evidence on Microblogs

Microblogs are a popular way for users to communicate and have recently caught the attention of researchers in the natural language processing (NLP) field. However, regardless of their rising popularity, little attention has been given towards determining the properties of discourse relations for the rapid, large-scale microblog data. Therefore, given their importance for various NLP tasks, we ...

Journal title:

Volume 3, Issue

Pages -

Publication date: 2017